pixel-level annotation
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short-and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
TagGAN: A Generative Model for Data Tagging
Nawaz, Muhammad, Nasir, Basma, Zia, Tehseen, Hussain, Zawar, Moreira, Catarina
Precise identification and localization of disease-specific features at the pixel-level are particularly important for early diagnosis, disease progression monitoring, and effective treatment in medical image analysis. However, conventional diagnostic AI systems lack decision transparency and cannot operate well in environments where there is a lack of pixel-level annotations. In this study, we propose a novel Generative Adversarial Networks (GANs)-based framework, TagGAN, which is tailored for weakly-supervised fine-grained disease map generation from purely image-level labeled data. TagGAN generates a pixel-level disease map during domain translation from an abnormal image to a normal representation. Later, this map is subtracted from the input abnormal image to convert it into its normal counterpart while preserving all the critical anatomical details. Our method is first to generate fine-grained disease maps to visualize disease lesions in a weekly supervised setting without requiring pixel-level annotations. This development enhances the interpretability of diagnostic AI by providing precise visualizations of disease-specific regions. It also introduces automated binary mask generation to assist radiologists. Empirical evaluations carried out on the benchmark datasets, CheXpert, TBX11K, and COVID-19, demonstrate the capability of TagGAN to outperform current top models in accurately identifying disease-specific pixels. This outcome highlights the capability of the proposed model to tag medical images, significantly reducing the workload for radiologists by eliminating the need for binary masks during training.
Weakly Supervised Pixel-Level Annotation with Visual Interpretability
Nasir, Basma, Zia, Tehseen, Nawaz, Muhammad, Moreira, Catarina
Medical image annotation is essential for diagnosing diseases, yet manual annotation is time-consuming, costly, and prone to variability among experts. To address these challenges, we propose an automated explainable annotation system that integrates ensemble learning, visual explainability, and uncertainty quantification. Our approach combines three pre-trained deep learning models - ResNet50, EfficientNet, and DenseNet - enhanced with XGrad-CAM for visual explanations and Monte Carlo Dropout for uncertainty quantification. This ensemble mimics the consensus of multiple radiologists by intersecting saliency maps from models that agree on the diagnosis while uncertain predictions are flagged for human review. We evaluated our system using the TBX11K medical imaging dataset and a Fire segmentation dataset, demonstrating its robustness across different domains. Experimental results show that our method outperforms baseline models, achieving 93.04% accuracy on TBX11K and 96.4% accuracy on the Fire dataset. Moreover, our model produces precise pixel-level annotations despite being trained with only image-level labels, achieving Intersection over Union IoU scores of 36.07% and 64.7%, respectively. By enhancing the accuracy and interpretability of image annotations, our approach offers a reliable and transparent solution for medical diagnostics and other image analysis tasks.
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos.
Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Records Leveraging Large Language and Vision Models
Wang, Shuo, Zhu, Yan, Luo, Xiaoyuan, Yang, Zhiwei, Zhang, Yizhe, Fu, Peiyao, Wang, Manning, Song, Zhijian, Li, Quanlin, Zhou, Pinghong, Guo, Yike
The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalisation. Image-text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, though annotating them is labour-intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We validate EndoKED using multi-centre datasets of raw colonoscopy records (~1 million images), demonstrating its superior performance in training polyp detection and segmentation models. Furthermore, the EndoKED pre-trained vision backbone enables data-efficient and generalisable learning for optical biopsy, achieving expert-level performance in both retrospective and prospective validation.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
- Health & Medicine > Diagnostic Medicine (1.00)
CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation
Zhang, Hantao, Guo, Weidong, Qiu, Chenyang, Wan, Shouhong, Zou, Bingbing, Wang, Wanqin, Jin, Peiquan
Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development.
- Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
- Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
From slides (through tiles) to pixels: an explainability framework for weakly supervised models in pre-clinical pathology
Bertolini, Marco, Le, Van-Khoa, Pencharz, Jake, Poehlmann, Andreas, Clevert, Djork-Arné, Villalba, Santiago, Montanari, Floriane
In pre-clinical pathology, there is a paradox between the abundance of raw data (whole slide images from many organs of many individual animals) and the lack of pixel-level slide annotations done by pathologists. Due to time constraints and requirements from regulatory authorities, diagnoses are instead stored as slide labels. Weakly supervised training is designed to take advantage of those data, and the trained models can be used by pathologists to rank slides by their probability of containing a given lesion of interest. In this work, we propose a novel contextualized eXplainable AI (XAI) framework and its application to deep learning models trained on Whole Slide Images (WSIs) in Digital Pathology. Specifically, we apply our methods to a multi-instance-learning (MIL) model, which is trained solely on slide-level labels, without the need for pixel-level annotations. We validate quantitatively our methods by quantifying the agreements of our explanations' heatmaps with pathologists' annotations, as well as with predictions from a segmentation model trained on such annotations. We demonstrate the stability of the explanations with respect to input shifts, and the fidelity with respect to increased model performance. We quantitatively evaluate the correlation between available pixel-wise annotations and explainability heatmaps. We show that the explanations on important tiles of the whole slide correlate with tissue changes between healthy regions and lesions, but do not exactly behave like a human annotator. This result is coherent with the model training strategy.
- North America > United States (0.14)
- Europe > Germany > Berlin (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > France (0.04)
- Research Report > New Finding (0.68)
- Research Report > Experimental Study (0.68)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
Meta Mask Correction for Nuclei Segmentation in Histopathological Image
Shi, Jiangbo, Jia, Chang, Gao, Zeyu, Gong, Tieliang, Wang, Chunbao, Li, Chen
Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks using a small amount of clean meta-data. Then the corrected masks can be used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model in an end-to-end way. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. It even achieves comparable performance with the model training on supervised data in some noisy settings.
- Health & Medicine > Therapeutic Area > Oncology (0.68)
- Health & Medicine > Diagnostic Medicine > Imaging (0.49)
Transferable Semi-Supervised Semantic Segmentation
Xiao, Huaxin (National University of Defense Technology) | Wei, Yunchao (Beckman Institute, University of Illinois at Urbana-Champaign) | Liu, Yu (National University of Defense Technology) | Zhang, Maojun (National University of Defense Technology) | Feng, Jiashi (National University of Singapore)
The performance of deep learning based semantic segmentation models heavily depends on sufficient data with careful annotations. However, even the largest public datasets only provide samples with pixel-level annotations for rather limited semantic categories. Such data scarcity critically limits scalability and applicability of semantic segmentation models in real applications. In this paper, we propose a novel transferable semi-supervised semantic segmentation model that can transfer the learned segmentation knowledge from a few strong categories with pixel-level annotations to unseen weak categories with only image-level annotations, significantly broadening the applicable territory of deep segmentation models. In particular, the proposed model consists of two complementary and learnable components: a Label transfer Network (L-Net) and a Prediction transfer Network (P-Net). The L-Net learns to transfer the segmentation knowledge from strong categories to the images in the weak categories and produces coarse pixel-level semantic maps, by effectively exploiting the similar appearance shared across categories. Meanwhile, the P-Net tailors the transferred knowledge through a carefully designed adversarial learning strategy and produces refined segmentation results with better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% performance of the fully-supervised baseline using 50% and 0% categories with pixel-level annotations respectively on PASCAL VOC 2012. With such a novel transfer mechanism, our proposed model is easily generalizable to a variety of new categories, only requiring image-level annotations, and offers appealing scalability in real applications.
- Asia > China (0.04)
- North America > United States > Illinois (0.04)
- Asia > Singapore (0.04)